The purpose of this study was to compare the area classification
accuracy of each of the following options of image classification:

1. a pixel-by-pixel maximum likelihood Gaussian classifier.

2. a sample classifier based on B-distance (derived from the
Bhattacharyya distance).

3. a sample classifier based on the generalized maximum likelihood
approach.

4. the pixel-by-pixel "single-cell signature acquisition" option of
the Image-100 System.

5. same as option 1, but using the following simple decision rule for
classification: if the percentage of pixels classified into the same
class, within a given test field, exceeded a threshold value of 60%,
they were all classified into the same class.

6. same as option 4, but using the decision rule given in option 5.

LANDSAT multispectral scanner data of the following three test sites
of the state of Sao Paulo, Brazil, were classified using each of the
above six options: 1. Sao Jose dos Campos, 2. Cachoeira Paulista,
3. Jardinopolis.

Considering both the errors of 
omission as well as commission, the sample 
classifier (option 2) yielded better 
classification accuracy, as compared to 
the maximum likelihood gaussian classifier 
(option 1) as well as single cell (option 
4). Options 5 and 6 considerably improved 
the classification accuracy of options 1 
and 4 respectively. 


I. INTRODUCTION

The purpose of this study was to compare the results of area
classification using pixel-by-pixel and sample classifiers applied to
multispectral scanner (MSS) LANDSAT data. The following three test
sites were selected for analysis in the state of Sao Paulo, Brazil:
1. Sao Jose dos Campos (23° 10' S, 45° 50' W). 2. Cachoeira Paulista
(22° 40' S, 45° W). 3. Jardinopolis (21° S, 47° 50' W).


Cloud-free multispectral scanner data from LANDSAT, of reasonable
quality, were available over these three test sites. In addition,
aerial photography and ground observations were available to assist
the data analysis. A short description of the three test sites is
given below:

1. Sao Jose dos Campos: Sao Jose dos Campos was selected because it is
one of the fastest growing small-size towns of Brazil and the authors
are well familiar with it. Many of the problems of this town are
similar to the problems of much larger urban centers.

2. Jardinopolis: It is one of the most important agricultural areas of
the state of Sao Paulo. The principal crops in this area are corn,
soybeans, cotton and sugar cane. The municipality of Jardinopolis has
a population of about 17,000 and an area of 552 km².

3. Cachoeira Paulista: It is a small town situated approximately
halfway between two large cities, Sao Paulo and Rio de Janeiro. It has
a population of 20,000 and an area of 279 km². A good part of this
town is covered by pasture, while there is a small urban area
including some of INPE's installations.


A part of the work on Sao Jose dos Campos reported here was presented
at the International Conference on Machine-aided Image Analysis,
4-6 September 1978, Oxford, England.


II. LITERATURE REVIEW 

Many investigators have analysed the multispectral scanner (MSS) data
of the LANDSAT satellite for applications to land use classification.
For example, Todd and Baumgardner 1 (1973) analysed LANDSAT MSS data
obtained over Marion County (Indianapolis), Indiana, by
computer-implemented techniques to evaluate the utility of satellite
data for urban land use classification. Several land use classes, such
as commerce/industry, single-family (newer) residential, trees, and
water exhibited spectrally separable characteristics and were
identified with greater than 90 percent accuracy. Ellefson et al. 2
(1973) carried out a computer-aided analysis of LANDSAT MSS data of
the San Francisco Bay area. Smith et al. 3 (1974) described the
application of spatial features to satellite land-use analysis.
Ellefson et al. 4 (1974) presented new techniques for mapping urban
land use and monitoring change for selected U.S. metropolitan areas.
They analysed LANDSAT MSS data using automatic pattern recognition
techniques for classification. Kumar and Silva 5 (1977) analysed the
statistical separability of agricultural cover types in much detail,
in terms of data quantity and depth, in the subsets of one to twelve
spectral channels.

1979 Machine Processing of Remotely Sensed Data Symposium

Cipra 7 (1974) compared multispectral imagery from LANDSAT to a soil
association map of Tippecanoe County, Indiana, based on a conventional
field survey. Hanuschak 8 (1976) gave a technique for estimating crop
acreage, utilizing LANDSAT imagery that is not cloud free. Aaronson 8
(1977) described the LANDSAT Agricultural Monitoring Program (LAMP) to
monitor Iowa's corn crop in near real-time. The program utilized
LANDSAT data, in conjunction with collateral data sources, to monitor
crop development and identify/assess anomalies and crop stresses.

Goldberg et al. 9 (1975) described methods and procedures which
outside investigators may use, with the automated processing equipment
of the Canada Centre for Remote Sensing (CCRS), for the purpose of
natural resource exploration and mapping. They compared the accuracies
of unsupervised and supervised methods, on the basis of the confusion
matrices generated by classifying exactly the same areas.

III. METHOD OF ANALYSIS

With the help of ground observations and aerial photography, a map of
the three test sites mentioned, showing the following classes, was
obtained: 1. Sao Jose dos Campos: residential, multi-family
residential, commercial, industrial, agricultural and unoccupied.
2. Jardinopolis: sugar cane, vegetation, pasture and bare soil.
3. Cachoeira Paulista: constructed areas, water, bare soil and
agriculture.

LANDSAT multispectral scanner data, on computer compatible tapes, of
these three test sites were analysed using Image-100*. With the aid of
aerial photography and ground observations, rectangular areas of each
of the above mentioned classes of each of the three test sites were
selected on the Image-100 display, avoiding the boundaries of the
respective classes. The areas of each of these classes were selected
carefully, so that they could be considered to be representative of
the respective classes.

Each of these classes was then divided into the following two
independent groups: training and test areas. The purpose of this study
was to compare the classification accuracy for the test areas of these
test sites, using the training areas, for each of the following
options of classification:

1. a pixel-by-pixel maximum likelihood Gaussian classifier.

2. a sample classifier based on B-distance (derived from the
Bhattacharyya distance).

3. a sample classifier based on the generalized maximum likelihood
approach (the probability distributions of the pixels within a sample
were assumed to be independent).

4. the pixel-by-pixel "single-cell signature acquisition" option of
the Image-100.

5. same as option 1, using the following simple decision rule for
classification: if the percentage of pixels classified into the same
class within a given test field exceeded a certain user selected
threshold value, for example 60%, they were all classified into the
same class.

6. same as option 4, using the decision rule given in option 5.

A brief explanation of options 1 to 4 is given below.
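Before those explanations, the decision rule of options 5 and 6 can be illustrated with a minimal sketch. The function name `apply_field_rule` is hypothetical, and the handling of a field in which no class exceeds the threshold (the labels are left unchanged) is an assumption, since the text does not state what was done in that case.

```python
from collections import Counter


def apply_field_rule(pixel_labels, threshold=0.60):
    """Relabel every pixel of a test field with the dominant class, but only
    if that class holds more than `threshold` of the field's pixels."""
    if not pixel_labels:
        return []
    label, count = Counter(pixel_labels).most_common(1)[0]
    if count / len(pixel_labels) > threshold:
        return [label] * len(pixel_labels)
    # Assumption: below the threshold the per-pixel labels are kept as they are.
    return list(pixel_labels)
```

The rule is applied to the per-pixel output of option 1 (giving option 5) or of option 4 (giving option 6), one test field at a time.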

Pixel-by-Pixel Maximum Likelihood Gaussian Classifier (MAXVER): This
system, developed at INPE's Informatics Division, is available in
on-line mode on the Image-100. In this system, the covariance matrix
of each of the training classes is decomposed into an upper triangular
and a lower triangular matrix. A maximum of 18 classes can be used.
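The role of the triangular decomposition can be sketched as follows. This is not the MAXVER code: it assumes a Cholesky factorization (covariance = L L^T, so the upper factor is the transpose of the lower one) and equal prior probabilities for all classes, neither of which is stated in the text. All function names are hypothetical.

```python
import math


def cholesky(cov):
    """Lower-triangular L with cov = L * L^T (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def log_likelihood(x, mean, L):
    """Gaussian log-density of pixel x, omitting the class-independent constant."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L z = d; z.z is then the Mahalanobis distance.
    z = []
    for i in range(len(d)):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(d)))
    return -0.5 * (sum(v * v for v in z) + log_det)


def classify_pixel(x, classes):
    """classes maps a class name to (mean, L); returns the most likely class."""
    return max(classes, key=lambda name: log_likelihood(x, *classes[name]))
```

Factoring each class covariance once during training means that classifying a pixel needs only a triangular solve per class, with no matrix inversion.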

* Image-100 is a data processing system marketed by General Electric
Co. to extract thematic information and enhance multispectral imagery.

Sample Classifier Based on B-Distance: Assuming that each of the
classes has a multivariate Gaussian distribution, the B-distance
between two classes is given by 11

$$B = 2\,(1 - e^{-\alpha}), \tag{1}$$

where

$$\alpha = \frac{1}{8}\,(U_1 - U_2)^T\,\Sigma^{-1}\,(U_1 - U_2) + \frac{1}{2}\,\log_e\!\left[\frac{\det \Sigma}{\sqrt{\det \Sigma_1 \cdot \det \Sigma_2}}\right], \tag{2}$$

where $U_1$ and $U_2$ are the mean vectors of classes one and two
respectively, whereas $\Sigma_1$ and $\Sigma_2$ are the covariance
matrices of the same two classes,

$$\Sigma = \frac{1}{2}\,[\Sigma_1 + \Sigma_2] \tag{3}$$

and $T$ denotes transpose.

The average B-distance over all pairs of classes is given by

$$B_{AVE}(C_1, C_2, \ldots, C_n) = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} B(i, j \mid C_1, C_2, \ldots, C_n) \tag{4}$$

where

$m$ = number of classes

$B(i, j \mid C_1, C_2, \ldots, C_n)$ = B-distance between classes $i$
and $j$ in the channels $C_1, C_2, \ldots, C_n$.

A sample classifier based on B-distance is available in on-line mode
on the Image-100 12, 13. The B-distance is computed between a test
field and each of the training classes, and the test field is
classified into the class for which the B-distance is minimum. Fields
classified into the same class are stored in the same theme, to give
them a distinct color.

Sample Classifier Based on the Generalized Maximum Likelihood
Approach: This classifier is available in on-line mode 14 on the
Image-100. The maximum likelihood decision is based on the joint
probability distributions of the pixels within a sample, assuming
independence of the probability distributions of pixels within a
sample.

Pixel-by-Pixel Single Cell Signature Acquisition Option of the
Image-100: This option creates a four-dimensional rectangular
parallelepiped, each side of which corresponds to the signature limits
of the training areas in each channel. For example, in the case of
Jardinopolis, using the training areas of vegetation, the number of
pixels classified as "vegetation" by the "single-cell option" inside
the test fields of each of these four classes (sugar cane, vegetation,
pasture and bare soil) was determined. An identical analysis was
repeated for each of the other three classes (sugar cane, pasture and
bare soil). Thus, a confusion matrix showing the total number of
pixels (picture elements) of each class classified correctly as well
as classified incorrectly into each of the other classes was obtained.
Similarly, a confusion matrix was obtained for Sao Jose dos Campos and
for Cachoeira Paulista.


Unfortunately, due to lack of machine time, the following options of
classification of these three test sites, out of the six options
mentioned above, could not be carried out: (1) Sao Jose dos Campos:
option no. 3; (2) Jardinopolis: option no. 6; (3) Cachoeira Paulista:
options no. 1, 3, 5 and 6.

In addition to these six options of classification, the effect of the
size of training samples on the percentage of correct classification
was investigated. Using 20% of the total area of each class for
training, the three test sites were classified using option 2 as well
as option 4. An identical analysis was done using 10% as well as 5% of
the total area of each class for training, but using the same test
fields, to investigate the effect of size of the training samples on
the percentage of correct classification. This analysis was done for
each of the three test sites, with the exception of classifying Sao
Jose dos Campos using option 2, due to lack of time available.

In the case of Sao Jose dos Campos, the average B-distance was
computed for all possible subsets of one to four spectral channels,
out of the four available channels. For each value of B-distance, the
probability of correct classification was estimated from the curve of
Swain and King 11 (1973).
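The channel-subset computation can be sketched as below. The sketch assumes the B-distance has the form B = 2(1 - e^(-alpha)), with alpha the Bhattacharyya distance between two Gaussian classes, and averages it over all class pairs. The helper names and the `(mean, covariance)` layout of the class statistics are hypothetical, and the step from B to a probability of correct classification is not reproduced because it relies on the published curve.

```python
import math
from itertools import combinations


def det_and_solve(a, b):
    """Return (det(a), x) with a x = b, by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            raise ValueError("singular matrix")
        if p != c:
            m[c], m[p] = m[p], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return det, x


def b_distance(u1, s1, u2, s2):
    """B = 2 (1 - exp(-alpha)), alpha = Bhattacharyya distance of two Gaussians."""
    n = len(u1)
    s = [[0.5 * (s1[i][j] + s2[i][j]) for j in range(n)] for i in range(n)]
    d = [a - b for a, b in zip(u1, u2)]
    det_s, y = det_and_solve(s, d)  # y = S^-1 (U1 - U2)
    det1, _ = det_and_solve(s1, d)
    det2, _ = det_and_solve(s2, d)
    alpha = 0.125 * sum(di * yi for di, yi in zip(d, y))
    alpha += 0.5 * math.log(det_s / math.sqrt(det1 * det2))
    return 2.0 * (1.0 - math.exp(-alpha))


def subset(stats, channels):
    """Restrict a (mean, covariance) pair to the chosen channel indices."""
    mean, cov = stats
    return ([mean[c] for c in channels],
            [[cov[i][j] for j in channels] for i in channels])


def average_b(class_stats, channels):
    """Mean B-distance over all pairs of classes in the given channels."""
    pairs = list(combinations(class_stats, 2))
    total = sum(b_distance(*subset(a, channels), *subset(b, channels))
                for a, b in pairs)
    return total / len(pairs)
```

Ranking the subsets then amounts to evaluating `average_b(class_stats, channels)` for every `channels` in `combinations(range(4), k)`, with `k` from 1 to 4, and sorting by the result.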

For Sao Jose dos Campos, in addition to the six options of
classification mentioned earlier, the "multicell signature
acquisition" as well as the "interactive acquisition" options of the
Image-100 were used. In the multicell signature acquisition, the
parallelepiped of spectral signature is subdivided into cells, each of
unit volume, and the number of pixels in each of these unit cells is
counted. These cell counts are, thus, measures of the probability
distribution of the spectral cluster. By raising or lowering the
threshold on the cell counts, one can vary the size of the
four-dimensional probability distribution of the spectral cluster,
deleting or adding cells according to whether their counts reach the
variable threshold. In the interactive signature modification option,
the user performs training on the misclassified area, adding the
errors of omission and subtracting the errors of commission until
satisfied with the results.
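The single-cell and multicell signatures can be contrasted in a short sketch. It is not the Image-100 implementation: it assumes that the signature limits are the per-channel minimum and maximum of the training pixels, and that a unit cell is one combination of integer digital counts. The function names are hypothetical.

```python
from collections import Counter


def single_cell_signature(training_pixels):
    """Per-channel (low, high) limits of the training pixels: a parallelepiped."""
    return [(min(ch), max(ch)) for ch in zip(*training_pixels)]


def in_signature(pixel, limits):
    """True if the pixel lies inside the parallelepiped in every channel."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(pixel, limits))


def multicell_signature(training_pixels, threshold):
    """Unit cells whose training-pixel count is at least `threshold`."""
    counts = Counter(tuple(p) for p in training_pixels)
    return {cell for cell, n in counts.items() if n >= threshold}
```

A pixel is assigned to the class under the multicell option only if `tuple(pixel)` is a member of the retained set. With few training pixels most cells inside the parallelepiped are empty, which is why the multicell option rejects pixels that the single-cell option accepts.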


IV. RESULTS AND DISCUSSION 
A. SAO JOSE DOS CAMPOS

Table 1 gives the values of B_AVE in all possible combinations of one,
two, three and four channels out of the four available channels. As
one would expect, the values of B_AVE increase with an increase in the
number of channels. In the subsets of one to three spectral channels,
channel 4, channels 4 & 7 (one in the visible and one in the near
infrared), and channels 4, 5 & 7 (two in the visible and one in the
near infrared) are found to be the best choices. Table 1 shows that in
the subset of two channels, channels 4 and 5 (visible wavelength
region) give higher probability of correct classification than
channels 6 & 7 (near infrared wavelength region). The authors believe
that each wavelength region (visible, near infrared, middle infrared
and thermal infrared) has independent information content. Thus, in
the subset of two spectral channels, one channel in the visible and
one channel in the near infrared wavelength region are found to be the
best choice. Kumar (1978) has analysed aircraft-collected MSS data in
much detail, in terms of data quantity and depth, in the subsets of
one to twelve spectral channels, to evaluate each spectral channel as
well as possible combinations of wavelength regions for statistical
separability of agricultural cover types.

The errors of omission and the errors of commission were calculated
and are shown in Table 2. For example, while using training fields of
residential areas, the pixels of test fields known to be residential
but not classified as residential constitute the errors of omission,
whereas the pixels of classes other than residential which are
classified by the Image-100 as residential constitute the errors of
commission. Similarly, the errors of omission and commission using the
multicell signature acquisition (m=1, m=2 and m=3), for the same
training and test fields of each class, were calculated and are given
in Table 2. The option m=1 means that all the unit cells in the
four-dimensional spectral space which had less than one pixel were
deleted from the spectral signature of the training fields for doing
classification. Similarly, the option m=2 means that all the unit
cells in the four-dimensional spectral space which had less than two
pixels were deleted from the spectral signature of the training fields
for doing classification, etc.

Table 2 shows that for the single-cell (option 4), the errors of
omission vary from 16.3% for the class commercial to 33.3% for the
class multifamily residential. The errors of commission vary from 5.6%
for the class commercial to 39.0% for the class industrial. This shows
that the classification accuracy for all the classes is rather poor,
except for the class "commercial", where the percentages of errors are
reasonably small (errors of omission = 16.3%, commission = 5.6%). This
is because of the small values of standard deviation for this class
(and hence, less overlap with other classes) in each of the spectral
channels, especially in the channels one (0.5 to 0.6 µm) and four
(0.8 to 1.1 µm).
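As an illustration of how such percentages follow from a confusion matrix, a minimal sketch is given below. The denominators are assumptions, since the text does not state them: omission is expressed relative to the test pixels known to belong to the class, and commission relative to all pixels assigned to the class. The function name and the nested-dictionary layout are hypothetical.

```python
def omission_commission(confusion, classes):
    """confusion[t][p] = number of test pixels of true class t assigned to p.

    Returns {class: (omission %, commission %)}.
    """
    errors = {}
    for c in classes:
        truth_total = sum(confusion[c].values())
        assigned_total = sum(confusion[t].get(c, 0) for t in classes)
        correct = confusion[c].get(c, 0)
        # Assumed denominators: known pixels for omission, assigned for commission.
        omission = 100.0 * (truth_total - correct) / truth_total if truth_total else 0.0
        commission = (100.0 * (assigned_total - correct) / assigned_total
                      if assigned_total else 0.0)
        errors[c] = (omission, commission)
    return errors
```

Under the single-cell option a pixel may fall inside the signature of more than one class, or of none, so a row of the matrix need not sum to the number of test pixels of that class; in that case the omission denominator would have to be the known pixel count rather than the row sum.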

In general, an increase in the 
standard deviations of a class in the 
spectral channels tends to reduce the 
errors of omission and increase the errors 
of commission. It was found that, taking 
into account both the errors of omission 
as well as those of commission, the 
classification accuracy generally 
decreases with an increase in the standard 
deviations, as expected. 

Table 2 shows, as expected, that the multicell option increases the
errors of omission and decreases the errors of commission. The
multicell option for m=1 considerably decreases the percentage of
correct classification for each of the classes. This is because the
number of pixels used for training in each class was relatively small
for statistical purposes, so that the unit cells in the
four-dimensional spectral space were sparsely populated. There may
therefore be many cells which are actually representative of the
class, but do not contain any training pixels. For the multicell
option, the errors of omission increase and the errors of commission
decrease as we go from m=1 to m=2 and m=3. Considering the errors of
omission as well as the errors of commission, the percentage of
correct classification decreases as we go from m=1 to m=2 and m=3.

Table 2 also shows that the interactive signature acquisition option
does not improve the classification accuracy, as compared to the
"single cell" option, because of the overlap between the classes in
the four-dimensional spectral space. It shows that, considering both
the errors of omission as well as commission, the sample classifier
(option 2) gave better classification accuracy, as compared to the
pixel-by-pixel classifier (option 1) as well as single cell (option
4). Options 5 and 6 considerably improve the classification accuracy
of options 1 and 4 respectively. This is very encouraging, because
using a simple decision rule in options 5 and 6 can considerably
improve the classification accuracy. These results still need to be
confirmed by a similar analysis of more test sites.

Table 2 also shows the effect of the size of training samples on the
classification accuracy using the single cell (option 4). As one would
expect, with the reduction in the size of training samples, the errors
of omission increase, whereas the errors of commission decrease.
Considering both errors of omission and commission, it seems that the
percentage of correct classification decreases as the size of the
training samples decreases. However, the cost of classifying the data
increases with an increase in the size of the training samples. Future
studies will include a cost-benefit analysis to find an optimum
trade-off between cost of classification and size of training samples.

B. CACHOEIRA PAULISTA

Table 3 shows results obtained on the site of Cachoeira Paulista. It
shows that the sample classifier (option 2) gives much better
classification accuracy, as compared to the single cell (option 4). In
addition, it shows that, considering errors of omission as well as
commission, the percentage of correct classification decreases as the
size of the training samples decreases, for the single cell option as
well as the sample classifier.

It can be seen that bare soil has large errors of omission, whereas
constructed area has large errors of commission. This is because the
class "constructed area" had a large standard deviation and a
considerable part of the interval of spectral response of bare soil
was within that of constructed area.

C. JARDINOPOLIS

Table 4 shows the errors of omission and commission for the
municipality of Jardinopolis. It shows that options 1, 2, 3 and 5 give
a considerably higher percentage of correct classification, compared
to option 4. In addition, it shows, as one would expect, that the
errors of omission increase, whereas the errors of commission decrease
with a decrease of size of the training samples. However, even when
the training area constitutes 20% of the total (training + test) area,
the errors of commission are much smaller than the respective errors
of omission. Thus, the authors believe that, in this particular case,
the sizes of the training samples constituting 5% or even 10% of the
total area are not adequate for achieving a reasonable percentage of
correct classification, using option 4.
Res. = Residential, Com. = Commercial, Agr. = Agricultural, Unoc. =
Unoccupied, M.Res. = Multifamily Residential, Ind. = Industrial.


Table 3. Percentage Errors of Omission and Commission (Cachoeira
Paulista).